Thresholding and vector quantization¶

Image binarization is a common operation. For grayscale images, the best threshold for binarization can be chosen manually. Alternatively, algorithms can select a threshold automatically, which is convenient for computer vision and for batch-processing a series of images.

The Otsu algorithm is the most famous thresholding algorithm. It maximizes the between-class variance of the two segmented groups of pixels. It can therefore be interpreted as a clustering algorithm, where the samples are pixels with a single feature: their grayscale value.
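The criterion Otsu maximizes can be sketched on synthetic data (the helper name `between_class_variance` is illustrative, not part of scikit-image): the threshold returned by `filters.threshold_otsu` should score at least as high as any other candidate.

```python
import numpy as np
from skimage import filters

# Synthetic bimodal "image": two clusters of gray values around 0.25 and 0.75
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(0.25, 0.05, 500),
                         rng.normal(0.75, 0.05, 500)])
im = np.clip(values, 0, 1).reshape(25, 40)

t = filters.threshold_otsu(im)

def between_class_variance(im, t):
    # Otsu's criterion: w0 * w1 * (mu0 - mu1)**2 for a candidate threshold t,
    # where w0, w1 are the group weights and mu0, mu1 the group means
    lo, hi = im[im <= t], im[im > t]
    w0, w1 = lo.size / im.size, hi.size / im.size
    return w0 * w1 * (lo.mean() - hi.mean()) ** 2

print(t)
```

The returned threshold falls between the two modes, where the between-class variance is largest.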

In [22]:
%matplotlib inline
import matplotlib
matplotlib.rcParams['image.interpolation'] = 'nearest'
import numpy as np
import matplotlib.pyplot as plt
from skimage import exposure, filters, io, color
In [2]:
ic = io.ImageCollection('FINAL_TRAINING_DATA_SET/*.jpg')
In [3]:
for i, image in enumerate(ic):
    im = color.rgb2gray(image)
                       
    hi = exposure.histogram(im)
    val = filters.threshold_otsu(im)
    fig, axes = plt.subplots(1, 2)
    axes[0].imshow(im, cmap='gray')
    axes[0].contour(im, [val], colors='y')
    axes[1].plot(hi[1], hi[0])
    axes[1].axvline(val, ls='--')

K-means Clustering¶

k-means clustering uses the Euclidean distance in feature space to cluster samples. If we want to cluster together pixels of similar color, the RGB space is not well suited since it mixes together information about color and light intensity. Therefore, we first transform the RGB image into Lab colorspace, and only use the color channels (a and b) for clustering.
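As a small sketch of the data preparation this implies (array shapes chosen only for illustration): each pixel becomes a sample with two features, the a and b channels, arranged as the `(n_samples, n_features)` matrix that scikit-learn expects.

```python
import numpy as np
from skimage import color

# A tiny random RGB image with values in [0, 1]; shape (2, 3, 3)
rng = np.random.default_rng(0)
image = rng.random((2, 3, 3))

# Lab conversion: channel 0 is lightness L, channels 1 and 2 are color (a, b)
im_lab = color.rgb2lab(image)

# Stack the flattened a and b channels, then transpose so that
# rows are samples (pixels) and columns are features
data = np.array([im_lab[..., 1].ravel(),
                 im_lab[..., 2].ravel()]).T

print(data.shape)  # (n_pixels, 2)
```

Each row of `data` holds the (a, b) pair of one pixel; the lightness channel is deliberately left out.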

In [23]:
from sklearn.cluster import KMeans
In [24]:
ic_seg_images = io.ImageCollection('Segmenting_image_data_set/*.jpg')
In [25]:
for i, image in enumerate(ic_seg_images):
    im_lab = color.rgb2lab(image)
    data = np.array([im_lab[..., 1].ravel(), im_lab[..., 2].ravel()])
    kmeans = KMeans(n_clusters=2, random_state=0).fit(data.T)
    segmentation = kmeans.labels_.reshape(image.shape[:-1])
    fig, axes = plt.subplots(1, 2)
    axes[0].imshow(image)
    axes[1].imshow(image)
    axes[1].contour(segmentation, colors='y')

SLIC algorithm: clustering using color and spatial features¶

In the thresholding / vector quantization approach presented above, pixels are characterized only by their color features. However, in most images neighboring pixels correspond to the same object. Hence, information on spatial proximity between pixels can be used in addition to color information.

SLIC (Simple Linear Iterative Clustering) is a segmentation algorithm which clusters pixels in both space and color. Therefore, pixels that are close in space and similar in color end up in the same segment.

SLIC is a superpixel algorithm, which segments an image into patches (superpixels) of neighboring pixels with a similar color. SLIC also works in the Lab colorspace. The compactness parameter controls the relative importance of the distance in image- and color-space.
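The effect of the compactness parameter can be sketched as follows, using a sample image bundled with scikit-image (the exact segment shapes depend on the image; the variable names are illustrative):

```python
import numpy as np
from skimage import data, segmentation

image = data.astronaut()

# Low compactness lets segments follow color boundaries freely;
# high compactness keeps segments compact and nearly grid-like
loose = segmentation.slic(image, n_segments=100, compactness=1)
tight = segmentation.slic(image, n_segments=100, compactness=100)

print(len(np.unique(loose)), len(np.unique(tight)))
```

Both calls return a label image of the same height and width as the input, with one integer label per superpixel.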

After the superpixel segmentation (which is also called oversegmentation, because we end up with more segments than we want), we can add a second clustering step to join superpixels belonging to the same region.

In [26]:
from skimage import segmentation
In [27]:
for i, image in enumerate(ic_seg_images):
    segments = segmentation.slic(image, n_segments=200, compactness=20)
    result = color.label2rgb(segments, image, kind='avg')
    
    
    im_lab = color.rgb2lab(result)
    data = np.array([im_lab[..., 1].ravel(),
                     im_lab[..., 2].ravel()])

    kmeans = KMeans(n_clusters=5, random_state=0).fit(data.T)
    labels = kmeans.labels_.reshape(image.shape[:-1])
    color_mean = color.label2rgb(labels, image, kind='avg')
    
    fig, axes = plt.subplots(1, 2)
    axes[0].imshow(segmentation.mark_boundaries(image, segments))
    axes[1].imshow(segmentation.mark_boundaries(image, labels))
In [28]:
def image_show(image, nrows=1, ncols=1, cmap='gray', **kwargs):
    fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize=(5, 5))
    ax.imshow(image, cmap=cmap, **kwargs)
    ax.axis('off')
    return fig, ax

for i, image in enumerate(ic_seg_images):
    image_slic = segmentation.slic(image)
    image_show(color.label2rgb(image_slic, image, kind='avg'));
In [30]:
from skimage import measure
measure.regionprops?
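As a sketch of where this could go next (variable names are illustrative), `measure.regionprops` can measure each superpixel produced by SLIC, e.g. its area or centroid:

```python
import numpy as np
from skimage import data, measure, segmentation

image = data.astronaut()
segments = segmentation.slic(image, n_segments=50, compactness=20)

# regionprops treats label 0 as background, so shift SLIC labels to start at 1
props = measure.regionprops(segments + 1)

# Each region exposes measurements such as area and centroid
areas = [p.area for p in props]
print(len(props), min(areas), max(areas))
```

Since the superpixels partition the image, the region areas sum to the total number of pixels.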